nxG_extractfeature_descStats extracted from graphs¶

DataFrame.describe(percentiles=None, include=None, exclude=None)[source] Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters:
percentiles : list-like of numbers, optional The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles. include : ‘all’, list-like of dtypes or None (default), optional A white list of data types to include in the result. Ignored for Series. Here are the options: ‘all’ : All columns of the input will be included in the output.

A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'

None (default) : The result will include all numeric columns.

exclude : list-like of dtypes or None (default), optional,

A black list of data types to omit from the result. Ignored for Series. Here are the options:

A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To exclude pandas categorical columns, use 'category' None (default) : The result will exclude nothing.

Returns:
summary: Series/DataFrame of summary statistics

source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html

Notes¶

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.

If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

In [1]:
# returning 18 variables
# discrall_Degree.columns
'''
Index([u'count', u'mean', u'std', u'min', u'25%', u'50%', u'75%', u'max',
   u'skew', u'kurt', u'var', u'cumsummean', u'cumsumstd', u'cumsummin',
   u'cumsum25%', u'cumsum50%', u'cumsum75%', u'cumsummax']
'''
Out[1]:
"\nIndex([u'count', u'mean', u'std', u'min', u'25%', u'50%', u'75%', u'max',\n   u'skew', u'kurt', u'var', u'cumsummean', u'cumsumstd', u'cumsummin',\n   u'cumsum25%', u'cumsum50%', u'cumsum75%', u'cumsummax']\n"

SAE: Autoencoder trainig and validation¶

from NME_DEC/train_sae_wimgfeatures_descStats.py¶

In [2]:
import sys
import os
import mxnet as mx
import numpy as np
import pandas as pd
import data
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
import model
from autoencoder import AutoEncoderModel
from solver import Solver, Monitor
import logging
import sklearn
from sklearn.manifold import TSNE
from utilities import *
try:
   import cPickle as pickle
except:
   import pickle
import gzip

# for visualization
from sklearn.manifold import TSNE
from utilities import *
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
In [3]:
NME_nxgraphs = r'Z:\Cristina\Section3\paper_notes_section3_MODIFIED\datasets'
    
allNMEs_dynamic = pd.read_csv(os.path.join(NME_nxgraphs,'dyn_roi_records_allNMEs_descStats.csv'), index_col=0)

allNMEs_morphology = pd.read_csv(os.path.join(NME_nxgraphs,'morpho_roi_records_allNMEs_descStats.csv'), index_col=0)

allNMEs_texture = pd.read_csv(os.path.join(NME_nxgraphs,'text_roi_records_allNMEs_descStats.csv'), index_col=0)

allNMEs_stage1 = pd.read_csv(os.path.join(NME_nxgraphs,'stage1_roi_records_allNMEs_descStats.csv'), index_col=0)

# to load SERw matrices for all lesions
with gzip.open(os.path.join(NME_nxgraphs,'nxGdatafeatures_allNMEs_descStats.pklz'), 'rb') as fin:
    nxGdatafeatures = pickle.load(fin)

# to load discrall_dict dict for all lesions
with gzip.open(os.path.join(NME_nxgraphs,'nxGnormfeatures_allNMEs_descStats.pklz'), 'rb') as fin:
    discrall_dict_allNMEs = pickle.load(fin)           

#########
# shape input (798L, 427L)     
nxGdiscfeatures = discrall_dict_allNMEs   
print('Loading {} leasions with nxGdiscfeatures of size = {}'.format(nxGdiscfeatures.shape[0], nxGdiscfeatures.shape[1]) )

print('Normalizing dynamic {} leasions with features of size = {}'.format(allNMEs_dynamic.shape[0], allNMEs_dynamic.shape[1]))
normdynamic = (allNMEs_dynamic - allNMEs_dynamic.mean(axis=0)) / allNMEs_dynamic.std(axis=0)
normdynamic.mean(axis=0)

print('Normalizing morphology {} leasions with features of size = {}'.format(allNMEs_morphology.shape[0], allNMEs_morphology.shape[1]))
normorpho = (allNMEs_morphology - allNMEs_morphology.mean(axis=0)) / allNMEs_morphology.std(axis=0)
normorpho.mean(axis=0)

print('Normalizing texture {} leasions with features of size = {}'.format(allNMEs_texture.shape[0], allNMEs_texture.shape[1]))
normtext = (allNMEs_texture - allNMEs_texture.mean(axis=0)) / allNMEs_texture.std(axis=0)
normtext.mean(axis=0)

print('Normalizing stage1 {} leasions with features of size = {}'.format(allNMEs_stage1.shape[0], allNMEs_stage1.shape[1]))
normstage1 = (allNMEs_stage1 - allNMEs_stage1.mean(axis=0)) / allNMEs_stage1.std(axis=0)
normstage1.mean(axis=0)

# shape input (798L, 427L)    
combX_allNME = np.concatenate((nxGdiscfeatures, normdynamic.as_matrix(), normorpho.as_matrix(), normtext.as_matrix(), normstage1.as_matrix()), axis=1)       
YnxG_allNME = np.asarray([nxGdatafeatures['roi_id'].values,
        nxGdatafeatures['classNME'].values,
        nxGdatafeatures['nme_dist'].values,
        nxGdatafeatures['nme_int'].values])

print('Loading {} all NME of size = {}'.format(combX_allNME.shape[0], combX_allNME.shape[1]) )
print('Loading all NME lables [label,BIRADS,dist,enh] of size = {}'.format(YnxG_allNME[0].shape[0])   )

# define variables for DEC 
roi_labels = YnxG_allNME[1]  
roi_labels = ['K' if rl=='U' else rl for rl in roi_labels]
Loading 792 leasions with nxGdiscfeatures of size = 326
Normalizing dynamic 792 leasions with features of size = 34
Normalizing morphology 792 leasions with features of size = 19
Normalizing texture 792 leasions with features of size = 44
Normalizing stage1 792 leasions with features of size = 100
Loading 792 all NME of size = 523
Loading all NME lables [label,BIRADS,dist,enh] of size = 792
In [4]:
######################
## From Pre-train/fine tune the SAE
######################
save_to = r'Z:\Cristina\Section3\paper_notes_section3_MODIFIED\save_to\SAEmodels'
input_size = combX_allNME.shape[1]
latent_size = [input_size/rxf for rxf in [25,15,10,5,2]]

# train/test splits (test is 10% of labeled data)
sep = int(combX_allNME.shape[0]*0.10)
X_val = combX_allNME[:sep]
y_val = YnxG_allNME[1][:sep]
X_train = combX_allNME[sep:]
y_train = YnxG_allNME[1][sep:]
batch_size = 125 # 160 32*5 = update_interval*5
X_val[np.isnan(X_val)] = 0.00001

allAutoencoders = []
for output_size in latent_size:
    # Train or Read autoencoder: interested in encoding/decoding the input nxg features into LD latent space        
    # optimized for clustering with DEC
    xpu = mx.cpu()
    ae_model = AutoEncoderModel(xpu, [X_train.shape[1],500,500,2000,output_size], pt_dropout=0.2)
    ##  After Pre-train and finetuuning on X_train
    ae_model.load( os.path.join(save_to,'SAE_zsize{}_wimgfeatures_descStats_zeromean.arg'.format(str(output_size))) ) 

    ##  Get train/valid error (for Generalization)
    print "Autoencoder Training error: %f"%ae_model.eval(X_train)
    print "Autoencoder Validation error: %f"%ae_model.eval(X_val)
    # put useful metrics in a dict
    outdict = {'Training set': ae_model.eval(X_train),
               'Testing set': ae_model.eval(X_val),
               'output_size': output_size,
               'sep': sep}

    allAutoencoders.append(outdict)
Autoencoder Training error: 0.000937
Autoencoder Validation error: 0.060106
Autoencoder Training error: 0.000890
Autoencoder Validation error: 0.042340
Autoencoder Training error: 0.000910
Autoencoder Validation error: 0.027583
Autoencoder Training error: 0.000802
Autoencoder Validation error: 0.017948
Autoencoder Training error: 0.000583
Autoencoder Validation error: 0.016800
In [5]:
######################
## Visualize the reconstructed inputs and the encoded representations.
######################
# train/test loss value o
dfSAE_perf = pd.DataFrame()
for SAE_perf in allAutoencoders:
    dfSAE_perf = dfSAE_perf.append( pd.DataFrame({'Reconstruction Error': pd.Series(SAE_perf)[0:2], 'train/validation':pd.Series(SAE_perf)[0:2].index, 'compressed size': SAE_perf['output_size']}) ) 

sns.set_style("darkgrid")
f, ax = plt.subplots(figsize=(10, 4))
sns.set_color_codes("pastel")
axSAE_perf = sns.pointplot(x="compressed size", y="Reconstruction Error", hue="train/validation", data=dfSAE_perf,  
                           markers=["x","o"], linestyles=["--","-"])  
aa=ax.set_xticklabels(['20x',"15x","10x","5x","2x"],fontsize=12)
ax.set_xlabel('compresed size',fontsize=14)
ax.set_ylabel('mean Reconstruccion Loss',fontsize=14)
ax.legend(loc="upper right",fontsize=15)
Out[5]:
<matplotlib.legend.Legend at 0x5948c3c8>
#¶

Unsupervised learning in optimal LD space: Fitting aN MLP DUAL OPTIMIZATION¶

#¶
In [6]:
##################################################################
##  Unsupervised learning in optimal LD space: Fitting aN MLP DUAL OPTIMIZATION
################################################################## 
from decModel_wimgF_dualopt_descStats import *
labeltype = 'wimgF_dualopt_descStats_saveparams' 
save_to = r'Z:\Cristina\Section3\paper_notes_section3_MODIFIED\save_to'
#r'Z:\Cristina\Section3\NME_DEC\SAEmodels\decModel_wimgF_dualopt_descStats_saveparams'

# to load a prevously DEC model  
input_size = combX_allNME.shape[1]
latent_size = [input_size/rxf for rxf in [15,10,5,2]]
varying_mu = [int(np.round(var_mu)) for var_mu in np.linspace(3,12,10)]

scoresM = np.zeros((len(latent_size),len(varying_mu),5))
scoresM_titles=[]

sns.set_color_codes("pastel")

######################
# DEC: define num_centers according to clustering variable
######################   
# to save all hyperparapsm
cvorigXAUC = []; 
cvZspaceAUC_cvtrain = []; 
cvZspace_stdAUC_cvtrain = [];
cvZspaceAUC_cvVal = []; 
cvZspace_stdAUC_cvVal = [];
TestAUC = []; 

for ik,znum in enumerate(latent_size):
    to_plotcvOrigXAUC = []
    to_plotinitAUC = []
    to_plotcvZspaceAUC_cvtrain = []
    to_plotcvZspaceAUC_cvVal = []
    to_plotTestAUC = []
    for ic,num_centers in enumerate(varying_mu): 
        X = combX_allNME
        y = roi_labels
        y_train_roi_labels = np.asarray(y)

        print('Loading autoencoder of znum = {}, mu = {} , post training DEC results'.format(znum,num_centers))
        dec_model = DECModel(mx.cpu(), X, num_centers, 1.0, znum, 'Z:\\Cristina\\Section3\\paper_notes_section3_MODIFIED\\save_to\\SAEmodels') 

        with gzip.open(os.path.join(save_to,'dec_model_z{}_mu{}_{}.arg'.format(znum,num_centers,labeltype)), 'rb') as fu:
            dec_model = pickle.load(fu)
          
        with gzip.open(os.path.join(save_to,'outdict_z{}_mu{}_{}.arg'.format(znum,num_centers,labeltype)), 'rb') as fu:
            outdict = pickle.load(fu)
        
        print('DEC train init AUC = {}'.format(outdict['meanAuc_cv'][0]))
        max_meanAuc_cv = outdict['meanAuc_cv'][-1]
        indmax_meanAuc_cv = outdict['meanAuc_cv'].index(max_meanAuc_cv)
        print r'DEC train max meanAuc_cv = {} $\pm$ {}'.format(max_meanAuc_cv,dec_model['std_auc'][indmax_meanAuc_cv])
        print('DEC validation AUC at max meanAuc_cv = {}'.format(outdict['auc_val'][indmax_meanAuc_cv]))
        
        #####################
        # extract Z-space from optimal DEC model
        #####################
        # saved output results
        dec_args_keys = ['encoder_1_bias', 'encoder_3_weight', 'encoder_0_weight', 
        'encoder_0_bias', 'encoder_2_weight', 'encoder_1_weight', 
        'encoder_3_bias', 'encoder_2_bias']
        dec_args = {key: v for key, v in dec_model.items() if key in dec_args_keys}
        dec_args['dec_mubestacci'] = dec_model['dec_mu']
        
        N = X.shape[0]
        all_iter = mx.io.NDArrayIter({'data': X}, batch_size=X.shape[0], shuffle=False,
                                                  last_batch_handle='pad')   
        ## extract embedded point zi 
        mxdec_args = {key: mx.nd.array(v) for key, v in dec_args.items() if key != 'dec_mubestacci'}                           
        aDEC = DECModel(mx.cpu(), X, num_centers, 1.0, znum, 'Z:\\Cristina\\Section3\\paper_notes_section3_MODIFIED\\save_to\\SAEmodels') 
        
        # organize weights and biases
        l1=[v.asnumpy().shape for k,v in aDEC.ae_model.args.iteritems()]
        k1=[k for k,v in aDEC.ae_model.args.iteritems()]
        l2=[v.asnumpy().shape for k,v in mxdec_args.iteritems()]
        k2=[k for k,v in mxdec_args.iteritems()]

        for ikparam,sizeparam in enumerate(l1):
            for jkparam,savedparam in enumerate(l2):
                if(sizeparam == savedparam):
                    #print('updating layer parameters: {}'.format(savedparam))
                    aDEC.ae_model.args[k1[ikparam]] = mxdec_args[k2[jkparam]]

        zbestacci = model.extract_feature(aDEC.feature, mxdec_args, None, all_iter, X.shape[0], aDEC.xpu).values()[0]      

        # compute model-based best-pbestacci or dec_model['pbestacci']
        pbestacci = np.zeros((zbestacci.shape[0], dec_model['num_centers']))
        aDEC.dec_op.forward([zbestacci, dec_args['dec_mubestacci'].asnumpy()], [pbestacci])
        #pbestacci = dec_model['pbestacci']
        
        # pool Z-space variables
        datalabels = np.asarray(y)
        dataZspace = np.concatenate((zbestacci, pbestacci), axis=1) 

        #####################
        # unbiased assessment: SPlit train/held-out test
        #####################
        # to compare performance need to discard unkown labels, only use known labels (only B or M)
        Z = dataZspace[datalabels!='K',:]
        y = datalabels[datalabels!='K']
      
        print '\n... MLP fully coneected layer trained on Z_train tested on Z_test' 
        sep = int(X.shape[0]*0.10)
        Z_test = Z[:sep]
        yZ_test = np.asanyarray(y[:sep]=='M').astype(int) 
        Z_train = Z[sep:]
        yZ_train = np.asanyarray(y[sep:]=='M').astype(int) 
       
        # We’ll load MLP using MXNet’s symbolic interface
        dataMLP = mx.sym.Variable('data')
        # MLP: two fully connected layers with 128 and 32 neurons each. 
        fc1  = mx.sym.FullyConnected(data=dataMLP, num_hidden = 128)
        act1 = mx.sym.Activation(data=fc1, act_type="relu")
        fc2  = mx.sym.FullyConnected(data=act1, num_hidden = 32)
        act2 = mx.sym.Activation(data=fc2, act_type="relu")
        # data has 2 classes
        fc3  = mx.sym.FullyConnected(data=act2, num_hidden=2)
        # Softmax output layer
        mlp  = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
        # create a trainable module on CPU     
        batch_size = 50
        mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
        # pass train/test data to allocate model (bind state)
        MLP_train_iter = mx.io.NDArrayIter(Z_train, yZ_train, batch_size, shuffle=False)
        mlp_model.bind(MLP_train_iter.provide_data, MLP_train_iter.provide_label)
        mlp_model.init_params()   
        mlp_model.init_optimizer()
        mlp_model_params = mlp_model.get_params()[0]
        
        # update parameters based on optimal found during cv Training
        from mxnet import ndarray
        params_dict = ndarray.load(os.path.join(save_to,'mlp_model_params_z{}_mu{}.arg'.format(znum,num_centers)))
        arg_params = {}
        aux_params = {}
        for k, value in params_dict.items():
            arg_type, name = k.split(':', 1)
            if arg_type == 'arg':
                arg_params[name] = value
            elif arg_type == 'aux':
                aux_params[name] = value
            else:
                raise ValueError("Invalid param file ")

        # order of params: [(128L, 266L),(128L,),(32L, 128L),(32L,),(2L, 32L),(2L,)]
        # organize weights and biases
        l1=[v.asnumpy().shape for k,v in mlp_model_params.iteritems()]
        k1=[k for k,v in mlp_model_params.iteritems()]
        l2=[v.asnumpy().shape for k,v in arg_params.iteritems()]
        k2=[k for k,v in arg_params.iteritems()]

        for ikparam,sizeparam in enumerate(l1):
            for jkparam,savedparam in enumerate(l2):
                if(sizeparam == savedparam):
                    #print('updating layer parameters: {}'.format(savedparam))
                    mlp_model_params[k1[ikparam]] = arg_params[k2[jkparam]]
        # upddate model parameters
        mlp_model.set_params(mlp_model_params, aux_params)
        
        #####################
        # ROC: Z-space MLP fully coneected layer for classification
        #####################
        figROCs = plt.figure(figsize=(12,4))    
        # Run classifier with cross-validation and plot ROC curves
        cv = StratifiedKFold(n_splits=5,random_state=3)
        # Evaluate a score by cross-validation
        tprs_train = []; aucs_train = []
        tprs_val = []; aucs_val = []
        mean_fpr = np.linspace(0, 1, 100)
        cvi = 0
        for train, test in cv.split(Z_train, yZ_train):
            ############### on train
            MLP_train_iter = mx.io.NDArrayIter(Z_train[train], yZ_train[train], batch_size)  
            # prob[i][j] is the probability that the i-th validation contains the j-th output class.
            prob_train = mlp_model.predict(MLP_train_iter)
            # Compute ROC curve and area the curve
            fpr_train, tpr_train, thresholds_train = roc_curve(yZ_train[train], prob_train.asnumpy()[:,1])
            # to create an ROC with 100 pts
            tprs_train.append(interp(mean_fpr, fpr_train, tpr_train))
            tprs_train[-1][0] = 0.0
            roc_auc = auc(fpr_train, tpr_train)
            aucs_train.append(roc_auc)
            
            ############### on validation
            MLP_val_iter = mx.io.NDArrayIter(Z_train[test], yZ_train[test], batch_size)    
            # prob[i][j] is the probability that the i-th validation contains the j-th output class.
            prob_val = mlp_model.predict(MLP_val_iter)
            # Compute ROC curve and area the curve
            fpr_val, tpr_val, thresholds_val = roc_curve(yZ_train[test], prob_val.asnumpy()[:,1])
            # to create an ROC with 100 pts
            tprs_val.append(interp(mean_fpr, fpr_val, tpr_val))
            tprs_val[-1][0] = 0.0
            roc_auc = auc(fpr_val, tpr_val)
            aucs_val.append(roc_auc)
            # plot
            #axaroc.plot(fpr, tpr, lw=1, alpha=0.6) # with label add: label='cv %d, AUC %0.2f' % (cvi, roc_auc)
            cvi += 1
           
        # plot for cv Train
        axaroc_train = figROCs.add_subplot(1,3,1)
        # add 50% or chance line
        axaroc_train.plot([0, 1], [0, 1], linestyle='--', lw=1, color='b', alpha=.9)
        # plot mean and +- 1 -std as fill area
        mean_tpr_train = np.mean(tprs_train, axis=0)
        mean_tpr_train[-1] = 1.0
        mean_auc_train = auc(mean_fpr, mean_tpr_train)
        std_auc_train = np.std(aucs_train)
        axaroc_train.plot(mean_fpr, mean_tpr_train, color='b',
                    label=r'cv Train (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc_train, std_auc_train),lw=3, alpha=1)     
        std_tpr = np.std(tprs_train, axis=0)
        tprs_upper = np.minimum(mean_tpr_train + std_tpr, 1)
        tprs_lower = np.maximum(mean_tpr_train - std_tpr, 0)
        axaroc_train.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,label=r'$\pm$ 1 std. dev.') 
        # set labels
        axaroc_train.set_xlabel('False Positive Rate',fontsize=16)
        axaroc_train.set_ylabel('True Positive Rate',fontsize=16)
        axaroc_train.set_title('Unsupervised DEC + cv MLP classifier Zspace dim={}'.format(Z.shape[1]),fontsize=18)
        axaroc_train.legend(loc="lower right",fontsize=16)
        
        # plot for cv val
        axaroc_val = figROCs.add_subplot(1,3,2)
        # add 50% or chance line
        axaroc_val.plot([0, 1], [0, 1], linestyle='--', lw=1, color='b', alpha=.9)
        # plot mean and +- 1 -std as fill area
        mean_tpr_val = np.mean(tprs_val, axis=0)
        mean_tpr_val[-1] = 1.0
        mean_auc_val = auc(mean_fpr, mean_tpr_val)
        std_auc_val = np.std(aucs_val)
        axaroc_val.plot(mean_fpr, mean_tpr_val, color='g',
                    label=r'cv Val (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc_val, std_auc_val),lw=3, alpha=1)     
        std_tpr = np.std(tprs_val, axis=0)
        tprs_upper = np.minimum(mean_tpr_val + std_tpr, 1)
        tprs_lower = np.maximum(mean_tpr_val - std_tpr, 0)
        axaroc_val.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,label=r'$\pm$ 1 std. dev.') 
        # set labels
        axaroc_val.set_xlabel('False Positive Rate',fontsize=16)
        axaroc_val.set_ylabel('True Positive Rate',fontsize=16)
        #axaroc_val.set_title('Unsupervised DEC + cv MLP classifier Zspace dim={}'.format(Z.shape[1]),fontsize=14)
        axaroc_val.legend(loc="lower right",fontsize=16)
        
        ################
        # plot AUC on heldout set
        ################
        MLP_heldout_iter = mx.io.NDArrayIter(Z_test, None, batch_size)   
        probas_heldout = mlp_model.predict(MLP_heldout_iter)
           
        # plot for cv val
        axaroc_test = figROCs.add_subplot(1,3,3)
        # add 50% or chance line
        axaroc_test.plot([0, 1], [0, 1], linestyle='--', lw=1, color='b', alpha=.9)
        # Compute ROC curve and area the curve
        fpr_test, tpr_test, thresholds_test = roc_curve(yZ_test, probas_heldout.asnumpy()[:, 1])
        auc_test = auc(fpr_test, tpr_test)
        axaroc_test.plot(fpr_test, tpr_test, color='r',
                    label=r'Test (AUC = %0.2f)' % (auc_test),lw=3, alpha=1)     
        # set labels            
        axaroc_test.set_xlabel('False Positive Rate',fontsize=16)
        axaroc_test.set_ylabel('True Positive Rate',fontsize=16)
        #axaroc.set_title('ROC LD DEC optimized space={}, all features={} - Unsupervised DEC + cv MLP classifier'.format(Z.shape[0],Z.shape[1]),fontsize=18)
        axaroc_test.legend(loc="lower right",fontsize=16)
        plt.show()
    
        ############# append to 
        cvorigXAUC.append(0.69)
        cvZspaceAUC_cvtrain.append(mean_auc_train)
        cvZspace_stdAUC_cvtrain.append(std_auc_train)
        cvZspaceAUC_cvVal.append(mean_auc_val)
        cvZspace_stdAUC_cvVal.append(std_auc_val)
        TestAUC.append(auc_test)
        
        ############# append to 
        to_plotcvOrigXAUC.append(0.69)
        to_plotinitAUC.append(dec_model['meanAuc_cv'][0])
        to_plotcvZspaceAUC_cvtrain.append(mean_auc_train)
        to_plotcvZspaceAUC_cvVal.append(mean_auc_val)
        to_plotTestAUC.append(auc_test)
        
        scoresM[ik,ic,0] = mean_auc_train
        scoresM_titles.append("DEC cv mean_auc_train")
        scoresM[ik,ic,1] = std_auc_train
        scoresM_titles.append("DEC cv std_auc_train")    
        scoresM[ik,ic,2] = mean_auc_val
        scoresM_titles.append("DEC cv mean_auc_val")
        scoresM[ik,ic,3] = std_auc_val
        scoresM_titles.append("DEC cv std_auc_val")       
        scoresM[ik,ic,4] = auc_test
        scoresM_titles.append("DEC heal-out test AUC")          
        
    # plot latent space Accuracies vs. original
    colors = plt.cm.jet(np.linspace(0, 1, 16))
    fig2 = plt.figure(figsize=(12,6))
    #ax2 = plt.axes()
    sns.set_context("notebook")
    ax1 = fig2.add_subplot(2,1,1)
    ax1.plot(varying_mu, to_plotcvZspaceAUC_cvtrain, color=colors[0], ls=':', label="DEC+MLP cv Train")
    ax1.plot(varying_mu, to_plotcvZspaceAUC_cvVal, color=colors[2], ls=':', label="DEC+MLP cv Validation")
    ax1.plot(varying_mu, to_plotTestAUC, color=colors[8], ls='--', label="DEC+MLP held-out test")
    ax1.plot(varying_mu, to_plotcvOrigXAUC, color=colors[6], label='HD space MLP held-out test')
    ax1.set_title("Performance AUC for x{} times dimentionality reduction".format(input_size/znum))
    ax1.set_xlabel("num clusters")
    ax1.set_ylabel("AUC")
    h1, l1 = ax1.get_legend_handles_labels()
    ax1.legend(h1, l1, loc='center left', bbox_to_anchor=(1, 0.5), prop={'size':16})
    
    print("summary stats at x{} times dimentionality reduction".format(input_size/znum))
    print("mean cvRFZspaceAUC_cvtrain ={}".format(np.mean(to_plotcvZspaceAUC_cvtrain)))
    print("std cvRFZspaceAUC_cvtrain ={}".format(np.std(to_plotcvZspaceAUC_cvtrain)))
    print("mean cvRFZspaceAUC_cvVal ={}".format(np.mean(to_plotcvZspaceAUC_cvVal)))
    print("std cvRFZspaceAUC_cvVal ={}".format(np.std(to_plotcvZspaceAUC_cvVal)))
    print("mean TestAUC ={}".format(np.mean(to_plotTestAUC)))
    print("std TesAUC ={}".format(np.std(to_plotTestAUC)))
    
Loading autoencoder of znum = 34, mu = 3 , post training DEC results
DEC train init AUC = 0.6416295306
DEC train max meanAuc_cv = 0.663250148544 $\pm$ 0.0975712138706
DEC validation AUC at max meanAuc_cv = 0.726206896552

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 4 , post training DEC results
DEC train init AUC = 0.651448306595
DEC train max meanAuc_cv = 0.675742721331 $\pm$ 0.113889179373
DEC validation AUC at max meanAuc_cv = 0.720689655172

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 5 , post training DEC results
DEC train init AUC = 0.629471182412
DEC train max meanAuc_cv = 0.719503862151 $\pm$ 0.0641157672004
DEC validation AUC at max meanAuc_cv = 0.727586206897

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 6 , post training DEC results
DEC train init AUC = 0.639624183007
DEC train max meanAuc_cv = 0.695187165775 $\pm$ 0.111091464145
DEC validation AUC at max meanAuc_cv = 0.706206896552

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 7 , post training DEC results
DEC train init AUC = 0.633682412359
DEC train max meanAuc_cv = 0.701663695781 $\pm$ 0.111138303625
DEC validation AUC at max meanAuc_cv = 0.738620689655

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 8 , post training DEC results
DEC train init AUC = 0.627450980392
DEC train max meanAuc_cv = 0.700081699346 $\pm$ 0.107304163774
DEC validation AUC at max meanAuc_cv = 0.739310344828

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 9 , post training DEC results
DEC train init AUC = 0.625757575758
DEC train max meanAuc_cv = 0.719273618538 $\pm$ 0.095940320071
DEC validation AUC at max meanAuc_cv = 0.76275862069

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 10 , post training DEC results
DEC train init AUC = 0.634826203209
DEC train max meanAuc_cv = 0.697898098633 $\pm$ 0.121962937207
DEC validation AUC at max meanAuc_cv = 0.743448275862

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 11 , post training DEC results
DEC train init AUC = 0.599903446227
DEC train max meanAuc_cv = 0.699242424242 $\pm$ 0.0918815625092
DEC validation AUC at max meanAuc_cv = 0.661379310345

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 34, mu = 12 , post training DEC results
DEC train init AUC = 0.628810160428
DEC train max meanAuc_cv = 0.706409685086 $\pm$ 0.103465332661
DEC validation AUC at max meanAuc_cv = 0.717931034483

... MLP fully coneected layer trained on Z_train tested on Z_test
summary stats at x15 times dimentionality reduction
mean cvRFZspaceAUC_cvtrain =0.777752235349
std cvRFZspaceAUC_cvtrain =0.0204489257316
mean cvRFZspaceAUC_cvVal =0.779038918598
std cvRFZspaceAUC_cvVal =0.0188697254735
mean TestAUC =0.722620689655
std TesAUC =0.0263296580266
Loading autoencoder of znum = 52, mu = 3 , post training DEC results
DEC train init AUC = 0.607546048723
DEC train max meanAuc_cv = 0.671732026144 $\pm$ 0.120839844644
DEC validation AUC at max meanAuc_cv = 0.591724137931

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 4 , post training DEC results
DEC train init AUC = 0.612351455734
DEC train max meanAuc_cv = 0.642847593583 $\pm$ 0.118448736104
DEC validation AUC at max meanAuc_cv = 0.715172413793

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 5 , post training DEC results
DEC train init AUC = 0.604270647653
DEC train max meanAuc_cv = 0.689208259061 $\pm$ 0.104287918419
DEC validation AUC at max meanAuc_cv = 0.706206896552

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 6 , post training DEC results
DEC train init AUC = 0.609685086156
DEC train max meanAuc_cv = 0.690441176471 $\pm$ 0.0968092559628
DEC validation AUC at max meanAuc_cv = 0.653793103448

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 7 , post training DEC results
DEC train init AUC = 0.580481283422
DEC train max meanAuc_cv = 0.690292632204 $\pm$ 0.102921854986
DEC validation AUC at max meanAuc_cv = 0.695172413793

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 8 , post training DEC results
DEC train init AUC = 0.62385620915
DEC train max meanAuc_cv = 0.689661319073 $\pm$ 0.0950056095324
DEC validation AUC at max meanAuc_cv = 0.733793103448

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 9 , post training DEC results
DEC train init AUC = 0.632976827094
DEC train max meanAuc_cv = 0.687707961973 $\pm$ 0.114863560775
DEC validation AUC at max meanAuc_cv = 0.675862068966

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 10 , post training DEC results
DEC train init AUC = 0.592966428996
DEC train max meanAuc_cv = 0.686289364231 $\pm$ 0.0953059823717
DEC validation AUC at max meanAuc_cv = 0.74

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 11 , post training DEC results
DEC train init AUC = 0.603342245989
DEC train max meanAuc_cv = 0.70116607249 $\pm$ 0.108770600015
DEC validation AUC at max meanAuc_cv = 0.648965517241

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 52, mu = 12 , post training DEC results
DEC train init AUC = 0.585732323232
DEC train max meanAuc_cv = 0.696910279263 $\pm$ 0.0938364555179
DEC validation AUC at max meanAuc_cv = 0.698620689655

... MLP fully coneected layer trained on Z_train tested on Z_test
summary stats at x10 times dimentionality reduction
mean cvRFZspaceAUC_cvtrain =0.781727272727
std cvRFZspaceAUC_cvtrain =0.00999162594376
mean cvRFZspaceAUC_cvVal =0.780394385027
std cvRFZspaceAUC_cvVal =0.00875430925696
mean TestAUC =0.699310344828
std TesAUC =0.0324401946769
Loading autoencoder of znum = 104, mu = 3 , post training DEC results
DEC train init AUC = 0.652257872846
DEC train max meanAuc_cv = 0.689579619727 $\pm$ 0.101721841667
DEC validation AUC at max meanAuc_cv = 0.729655172414

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 4 , post training DEC results
DEC train init AUC = 0.628461081402
DEC train max meanAuc_cv = 0.664423648247 $\pm$ 0.0870075352121
DEC validation AUC at max meanAuc_cv = 0.698620689655

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 5 , post training DEC results
DEC train init AUC = 0.655622400475
DEC train max meanAuc_cv = 0.711319073084 $\pm$ 0.102277305304
DEC validation AUC at max meanAuc_cv = 0.711724137931

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 6 , post training DEC results
DEC train init AUC = 0.621821152704
DEC train max meanAuc_cv = 0.71977124183 $\pm$ 0.0911736437
DEC validation AUC at max meanAuc_cv = 0.739310344828

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 7 , post training DEC results
DEC train init AUC = 0.613502673797
DEC train max meanAuc_cv = 0.716347296494 $\pm$ 0.107785233397
DEC validation AUC at max meanAuc_cv = 0.717931034483

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 8 , post training DEC results
DEC train init AUC = 0.667810457516
DEC train max meanAuc_cv = 0.717446524064 $\pm$ 0.0927265480777
DEC validation AUC at max meanAuc_cv = 0.725517241379

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 9 , post training DEC results
DEC train init AUC = 0.632508912656
DEC train max meanAuc_cv = 0.716941473559 $\pm$ 0.102839241571
DEC validation AUC at max meanAuc_cv = 0.708965517241

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 10 , post training DEC results
DEC train init AUC = 0.668820558526
DEC train max meanAuc_cv = 0.716154188948 $\pm$ 0.0909750155404
DEC validation AUC at max meanAuc_cv = 0.753793103448

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 11 , post training DEC results
DEC train init AUC = 0.65713755199
DEC train max meanAuc_cv = 0.722704991087 $\pm$ 0.088124678953
DEC validation AUC at max meanAuc_cv = 0.688965517241

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 104, mu = 12 , post training DEC results
DEC train init AUC = 0.616451277481
DEC train max meanAuc_cv = 0.715886809269 $\pm$ 0.104953199122
DEC validation AUC at max meanAuc_cv = 0.716551724138

... MLP fully coneected layer trained on Z_train tested on Z_test
summary stats at x5 times dimentionality reduction
mean cvRFZspaceAUC_cvtrain =0.773968240093
std cvRFZspaceAUC_cvtrain =0.0108469884413
mean cvRFZspaceAUC_cvVal =0.775352049911
std cvRFZspaceAUC_cvVal =0.00948099543844
mean TestAUC =0.718827586207
std TesAUC =0.0168847249249
Loading autoencoder of znum = 261, mu = 3 , post training DEC results
DEC train init AUC = 0.673262032086
DEC train max meanAuc_cv = 0.732642602496 $\pm$ 0.0931672775493
DEC validation AUC at max meanAuc_cv = 0.775172413793

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 4 , post training DEC results
DEC train init AUC = 0.677963458111
DEC train max meanAuc_cv = 0.731417112299 $\pm$ 0.0895483390756
DEC validation AUC at max meanAuc_cv = 0.740689655172

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 5 , post training DEC results
DEC train init AUC = 0.685658051099
DEC train max meanAuc_cv = 0.734061200238 $\pm$ 0.0829846283581
DEC validation AUC at max meanAuc_cv = 0.702068965517

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 6 , post training DEC results
DEC train init AUC = 0.687923351159
DEC train max meanAuc_cv = 0.72506684492 $\pm$ 0.0744661351573
DEC validation AUC at max meanAuc_cv = 0.74

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 7 , post training DEC results
DEC train init AUC = 0.663049613785
DEC train max meanAuc_cv = 0.723499702911 $\pm$ 0.0858269809613
DEC validation AUC at max meanAuc_cv = 0.766896551724

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 8 , post training DEC results
DEC train init AUC = 0.695291146762
DEC train max meanAuc_cv = 0.7311942959 $\pm$ 0.0933493696945
DEC validation AUC at max meanAuc_cv = 0.764137931034

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 9 , post training DEC results
DEC train init AUC = 0.711415626857
DEC train max meanAuc_cv = 0.703535353535 $\pm$ 0.0821212564027
DEC validation AUC at max meanAuc_cv = 0.737931034483

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 10 , post training DEC results
DEC train init AUC = 0.641837492573
DEC train max meanAuc_cv = 0.702792632204 $\pm$ 0.0851479319258
DEC validation AUC at max meanAuc_cv = 0.752413793103

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 11 , post training DEC results
DEC train init AUC = 0.662967914439
DEC train max meanAuc_cv = 0.728490790255 $\pm$ 0.0975338789362
DEC validation AUC at max meanAuc_cv = 0.751724137931

... MLP fully coneected layer trained on Z_train tested on Z_test
Loading autoencoder of znum = 261, mu = 12 , post training DEC results
DEC train init AUC = 0.642557932264
DEC train max meanAuc_cv = 0.723247177659 $\pm$ 0.0840747895324
DEC validation AUC at max meanAuc_cv = 0.747586206897

... MLP fully coneected layer trained on Z_train tested on Z_test
summary stats at x2 times dimentionality reduction
mean cvRFZspaceAUC_cvtrain =0.780102758353
std cvRFZspaceAUC_cvtrain =0.0134479590724
mean cvRFZspaceAUC_cvVal =0.786251485443
std cvRFZspaceAUC_cvVal =0.0133043336166
mean TestAUC =0.743517241379
std TesAUC =0.0225242899672
In [7]:
################
# to hold the hyperparams
grd = np.array([(zs, mus) for zs in latent_size for mus in varying_mu])
grdperf_DEC = pd.DataFrame(grd)
grdperf_DEC.columns = ["Zsize","n_mu"]

# finish fromating DEC performance
grdperf_DEC['cvorigXAUC'] = cvorigXAUC
grdperf_DEC['cvZspaceAUC_cvtrain'] = cvZspaceAUC_cvtrain
grdperf_DEC['cvZspace_stdAUC_cvtrain'] = cvZspace_stdAUC_cvtrain
grdperf_DEC['cvZspaceAUC_cvVal'] = cvZspaceAUC_cvVal
grdperf_DEC['cvZspace_stdAUC_cvVal'] = cvZspace_stdAUC_cvVal
grdperf_DEC['TestAUC'] = TestAUC
print(grdperf_DEC)

# save pooled model probabilties
grdperf_DEC.to_csv('datasets/grdperf_DEC.csv', header=True, index=False)
    Zsize  n_mu  cvorigXAUC  cvZspaceAUC_cvtrain  cvZspace_stdAUC_cvtrain  \
0      34     3        0.69             0.773958                 0.037140   
1      34     4        0.69             0.761415                 0.020337   
2      34     5        0.69             0.746370                 0.017510   
3      34     6        0.69             0.816598                 0.029340   
4      34     7        0.69             0.781169                 0.028171   
5      34     8        0.69             0.788786                 0.023948   
6      34     9        0.69             0.773639                 0.024514   
7      34    10        0.69             0.779825                 0.034705   
8      34    11        0.69             0.752863                 0.024552   
9      34    12        0.69             0.802899                 0.031373   
10     52     3        0.69             0.793885                 0.029320   
11     52     4        0.69             0.761184                 0.030441   
12     52     5        0.69             0.798863                 0.032366   
13     52     6        0.69             0.776605                 0.031390   
14     52     7        0.69             0.783213                 0.031257   
15     52     8        0.69             0.773074                 0.026327   
16     52     9        0.69             0.786537                 0.034833   
17     52    10        0.69             0.779046                 0.029217   
18     52    11        0.69             0.782088                 0.033778   
19     52    12        0.69             0.782778                 0.025902   
20    104     3        0.69             0.782138                 0.029196   
21    104     4        0.69             0.787349                 0.027821   
22    104     5        0.69             0.759464                 0.029295   
23    104     6        0.69             0.783653                 0.030064   
24    104     7        0.69             0.771752                 0.026392   
25    104     8        0.69             0.784693                 0.025163   
26    104     9        0.69             0.766660                 0.029268   
27    104    10        0.69             0.778728                 0.027692   
28    104    11        0.69             0.771929                 0.027321   
29    104    12        0.69             0.753317                 0.028648   
30    261     3        0.69             0.801616                 0.025357   
31    261     4        0.69             0.778517                 0.023590   
32    261     5        0.69             0.758591                 0.022881   
33    261     6        0.69             0.761565                 0.020963   
34    261     7        0.69             0.794541                 0.023317   
35    261     8        0.69             0.796605                 0.025414   
36    261     9        0.69             0.776311                 0.022913   
37    261    10        0.69             0.773957                 0.028239   
38    261    11        0.69             0.780966                 0.026257   
39    261    12        0.69             0.778359                 0.023100   

    cvZspaceAUC_cvVal  cvZspace_stdAUC_cvVal   TestAUC  
0            0.778810               0.138929  0.704828  
1            0.764097               0.102937  0.724138  
2            0.749443               0.083445  0.727586  
3            0.812389               0.117503  0.706207  
4            0.775728               0.117648  0.739310  
5            0.786646               0.107298  0.739310  
6            0.775765               0.102445  0.762069  
7            0.785071               0.134787  0.743448  
8            0.756484               0.104694  0.661379  
9            0.805957               0.116176  0.717931  
10           0.790515               0.118769  0.748966  
11           0.768486               0.120526  0.692414  
12           0.799265               0.126437  0.706207  
13           0.777236               0.124490  0.653793  
14           0.782479               0.127695  0.693793  
15           0.769058               0.106901  0.733793  
16           0.778305               0.137170  0.675862  
17           0.777213               0.118771  0.740000  
18           0.778624               0.135374  0.649655  
19           0.782761               0.106860  0.698621  
20           0.778788               0.120331  0.708276  
21           0.784403               0.109095  0.722069  
22           0.762953               0.120076  0.711724  
23           0.783727               0.118113  0.738621  
24           0.770224               0.106663  0.713793  
25           0.786453               0.106943  0.725517  
26           0.769392               0.118618  0.708966  
27           0.785740               0.109371  0.753793  
28           0.773351               0.114245  0.688966  
29           0.758489               0.113960  0.716552  
30           0.811988               0.104961  0.775172  
31           0.786579               0.096958  0.740690  
32           0.767216               0.098176  0.702759  
33           0.769192               0.088306  0.740000  
34           0.799926               0.094899  0.766897  
35           0.799161               0.101838  0.764138  
36           0.781417               0.099459  0.737931  
37           0.777778               0.115464  0.752414  
38           0.784343               0.108141  0.707586  
39           0.784915               0.096415  0.747586  
In [8]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator, FixedLocator, FormatStrFormatter
import matplotlib.cm

figscoresM, axes = plt.subplots(nrows=5, ncols=1, figsize=(14, 24)) 
for k,ax in enumerate(axes.flat):
    im = ax.imshow(scoresM[:,:,k], cmap='BuPu_r', interpolation='nearest')
    ax.grid(False)
    for u in range(len(latent_size)):        
        for v in range(len(varying_mu)):
            ax.text(v,u,'{:.2f}'.format(scoresM[u,v,k]), color=np.array([0.05,0.15,0.15,1]),
                         fontdict={'weight': 'bold', 'size': 14})
    # set ticks
    ax.xaxis.set_major_locator(FixedLocator(np.linspace(0,9,10)))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%d'))
    ax.xaxis.set_minor_locator(MultipleLocator(1))
    
    mu_labels = [str(mu) for mu in varying_mu]
    ax.set_xticklabels(mu_labels, minor=False,fontsize=16)
    ax.yaxis.set_major_locator(FixedLocator(np.linspace(0,3,4)))
    ax.yaxis.set_major_formatter(FormatStrFormatter('%d'))
    ax.yaxis.set_minor_locator(MultipleLocator(2))
    
    znum_labels = ['15x','10x','5x','2x'] #[str(znum) for znum in latent_size]
    ax.set_yticklabels(znum_labels, minor=False,fontsize=16)
    ax.xaxis.set_label('latent space reduction')
    ax.yaxis.set_label('# cluster centroids')
    ax.set_title(scoresM_titles[k],fontsize=16)
#¶

Find best performing parameters¶

#¶
In [9]:
input_size = combX_allNME.shape[1]
print "original input space = %d" % input_size 
dict_aucZlatent = pd.DataFrame() 
for k,znum in enumerate(latent_size):
    for l,num_c in enumerate(varying_mu):
        dict_aucZlatent = dict_aucZlatent.append( pd.Series({'Zspacedim':znum, 
                                                             'Zspace_AUC_ROC': scoresM[k,l,2], 
                                                              'Zspace_test_AUC_ROC': scoresM[k,l,4], 
                                                             'num_clusters':num_c}), ignore_index=True)
fig2 = plt.figure(figsize=(20,10))
ax2 = plt.axes()
sns.set_context("notebook")  
sns.pointplot(x="num_clusters", y="Zspace_AUC_ROC", hue="Zspacedim", data=dict_aucZlatent, ax=ax2, size=0.05) 
sns.pointplot(x="num_clusters", y="Zspace_test_AUC_ROC", hue="Zspacedim", data=dict_aucZlatent, ax=ax2, size=0.05, markers=["x","x","x","x"],linestyles=["--","--","--","--"]) 
ax2.xaxis.set_label('# clusters')
ax2.yaxis.set_label('Zspace AUC ROC')
ax2.set_title('Zspace AUC ROC vs. number of clusters',fontsize=20)
original input space = 523
Out[9]:
<matplotlib.text.Text at 0x5f5bea90>
In [10]:
# plot box plots
dict_aucZlatent['Zspacedim_cats'] = pd.Series(dict_aucZlatent['Zspacedim'], dtype="category")
dict_aucZlatent['num_clusters_cats'] = pd.Series(dict_aucZlatent['num_clusters'], dtype="category")
In [11]:
sns.set_color_codes("pastel")
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(2,2,1)
sns.boxplot(y="Zspace_AUC_ROC", x="Zspacedim_cats", data=dict_aucZlatent, ax=ax1)
znum_labels = ['15x','10x','5x','2x'] 
ax1.set_xticklabels(znum_labels, minor=False,fontsize=12)
ax1.set_xlabel('Latent space reduction')
ax1.set_ylabel('cv Train AUC ROC across # of centroids')

ax2 = fig.add_subplot(2,2,3)
sns.boxplot(y="Zspace_AUC_ROC", x="num_clusters_cats", data=dict_aucZlatent, ax=ax2)
ax2.set_xlabel('# cluster centroids')
ax2.set_ylabel('cv Train AUC ROC across reduction ratios')

ax3 = fig.add_subplot(2,2,2)
sns.boxplot(y="Zspace_test_AUC_ROC", x="Zspacedim_cats", data=dict_aucZlatent, ax=ax3)
ax3.set_xticklabels(znum_labels, minor=False,fontsize=12)
ax3.set_xlabel('Latent space reduction')
ax3.set_ylabel('Test AUC ROC across # of centroids')

ax4 = fig.add_subplot(2,2,4)
sns.boxplot(y="Zspace_test_AUC_ROC", x="num_clusters_cats", data=dict_aucZlatent, ax=ax4)
ax4.set_xlabel('# cluster centroids')
ax4.set_ylabel('Test AUC ROC across reduction ratios')
Out[11]:
<matplotlib.text.Text at 0x68673c18>
In [12]:
# find best performing by the average of broth train and test performance
max_aucZlatent = np.max(dict_aucZlatent[["Zspace_AUC_ROC", "Zspace_test_AUC_ROC"]].mean(axis=1))
indmax_meanAuc_cv = dict_aucZlatent[["Zspace_AUC_ROC", "Zspace_test_AUC_ROC"]].mean(axis=1) == max_aucZlatent
print "\n================== Best average train/test perfomance parameters:" 
bestperf_params = dict_aucZlatent[indmax_meanAuc_cv]
print bestperf_params
================== Best average train/test perfomance parameters:
    Zspace_AUC_ROC  Zspace_test_AUC_ROC  Zspacedim  num_clusters  \
30        0.811988             0.775172        261             3   

   Zspacedim_cats num_clusters_cats  
30            261                 3  
In [13]:
for u, znum in enumerate(latent_size):
    print('=============znum = {} , post training DEC results'.format(znum))
    max_test_val = max(scoresM[u,:,2])
    print "max auc at knum centroids", max_test_val
    iknum = [i for i, aucj in enumerate(scoresM[u,:,2]) if aucj == max_test_val]
    print varying_mu[iknum[0]]
    print "auc train cv, at knum centroids", scoresM[u,iknum[0],0]
    print "std auc train cv, at knum centroids", scoresM[u,iknum[0],1]
    print "auc val cv, at knum centroids", scoresM[u,iknum[0],2]
    print "std of max auc val cv, at knum centroids", scoresM[u,iknum[0],3]
    print "max acu heal-out test cv, at knum centroids", scoresM[u,iknum[0],4]
=============znum = 34 , post training DEC results
max auc at knum centroids 0.8123885918
6
auc train cv, at knum centroids 0.816598193473
std auc train cv, at knum centroids 0.0293400423638
auc val cv, at knum centroids 0.8123885918
std of max auc val cv, at knum centroids 0.117503113719
max acu heal-out test cv, at knum centroids 0.706206896552
=============znum = 52 , post training DEC results
max auc at knum centroids 0.799264705882
5
auc train cv, at knum centroids 0.798862665113
std auc train cv, at knum centroids 0.0323659851994
auc val cv, at knum centroids 0.799264705882
std of max auc val cv, at knum centroids 0.126436678815
max acu heal-out test cv, at knum centroids 0.706206896552
=============znum = 104 , post training DEC results
max auc at knum centroids 0.786452762923
8
auc train cv, at knum centroids 0.784692599068
std auc train cv, at knum centroids 0.0251632431944
auc val cv, at knum centroids 0.786452762923
std of max auc val cv, at knum centroids 0.106943392903
max acu heal-out test cv, at knum centroids 0.725517241379
=============znum = 261 , post training DEC results
max auc at knum centroids 0.811987522282
3
auc train cv, at knum centroids 0.801615675991
std auc train cv, at knum centroids 0.0253569164686
auc val cv, at knum centroids 0.811987522282
std of max auc val cv, at knum centroids 0.104961283321
max acu heal-out test cv, at knum centroids 0.775172413793
In [14]:
######################
# Combined ROCs
######################  
sns.set_color_codes("pastel")
## to append pooled predictions
pooled_pred_train = pd.DataFrame()
pooled_pred_val = pd.DataFrame()

for u, znum in enumerate(latent_size):
    print('znum = {} , post training DEC results'.format(znum))
    max_test_val = max(scoresM[u,:,4])
    print "max auc heal-out test cv, at knum centroids", max_test_val
    iknum = [i for i, aucj in enumerate(scoresM[u,:,4]) if aucj == max_test_val]
    print varying_mu[iknum[0]]
    print "auc train cv, at knum centroids", scoresM[u,i,0]
    print "std auc train cv, at knum centroids", scoresM[u,i,1]
    print "auc val cv, at knum centroids", scoresM[u,i,2]
    print "std of max auc val cv, at knum centroids", scoresM[u,i,3]

    num_centers =  varying_mu[iknum[0]]
    X = combX_allNME
    y = roi_labels
    y_train_roi_labels = np.asarray(y)

    print('Loading autoencoder of znum = {}, mu = {} , post training DEC results'.format(znum,num_centers))
    dec_model = DECModel(mx.cpu(), X, num_centers, 1.0, znum, 'Z:\\Cristina\\Section3\\paper_notes_section3_MODIFIED\\save_to\\SAEmodels') 

    with gzip.open(os.path.join(save_to,'dec_model_z{}_mu{}_{}.arg'.format(znum,num_centers,labeltype)), 'rb') as fu:
        dec_model = pickle.load(fu)
      
    with gzip.open(os.path.join(save_to,'outdict_z{}_mu{}_{}.arg'.format(znum,num_centers,labeltype)), 'rb') as fu:
        outdict = pickle.load(fu)
        
    print('DEC train init AUC = {}'.format(outdict['meanAuc_cv'][0]))
    max_meanAuc_cv = outdict['meanAuc_cv'][-1]
    indmax_meanAuc_cv = outdict['meanAuc_cv'].index(max_meanAuc_cv)
    print r'DEC train max meanAuc_cv = {} $\pm$ {}'.format(max_meanAuc_cv,dec_model['std_auc'][indmax_meanAuc_cv])
    print('DEC validation AUC at max meanAuc_cv = {}'.format(outdict['auc_val'][indmax_meanAuc_cv]))
    
    #####################
    # extract Z-space from optimal DEC model
    #####################
    # saved output results
    dec_args_keys = ['encoder_1_bias', 'encoder_3_weight', 'encoder_0_weight', 
    'encoder_0_bias', 'encoder_2_weight', 'encoder_1_weight', 
    'encoder_3_bias', 'encoder_2_bias']
    dec_args = {key: v for key, v in dec_model.items() if key in dec_args_keys}
    dec_args['dec_mubestacci'] = dec_model['dec_mu']
    
    N = X.shape[0]
    all_iter = mx.io.NDArrayIter({'data': X}, batch_size=X.shape[0], shuffle=False,
                                              last_batch_handle='pad')   
    ## extract embedded point zi 
    mxdec_args = {key: mx.nd.array(v) for key, v in dec_args.items() if key != 'dec_mubestacci'}                           
    aDEC = DECModel(mx.cpu(), X, num_centers, 1.0, znum, 'Z:\\Cristina\\Section3\\paper_notes_section3_MODIFIED\\save_to\\SAEmodels') 
    
    # organize weights and biases
    l1=[v.asnumpy().shape for k,v in aDEC.ae_model.args.iteritems()]
    k1=[k for k,v in aDEC.ae_model.args.iteritems()]
    l2=[v.asnumpy().shape for k,v in mxdec_args.iteritems()]
    k2=[k for k,v in mxdec_args.iteritems()]

    for ikparam,sizeparam in enumerate(l1):
        for jkparam,savedparam in enumerate(l2):
            if(sizeparam == savedparam):
                #print('updating layer parameters: {}'.format(savedparam))
                aDEC.ae_model.args[k1[ikparam]] = mxdec_args[k2[jkparam]]

    zbestacci = model.extract_feature(aDEC.feature, mxdec_args, None, all_iter, X.shape[0], aDEC.xpu).values()[0]      

    # compute model-based best-pbestacci or dec_model['pbestacci']
    pbestacci = np.zeros((zbestacci.shape[0], dec_model['num_centers']))
    aDEC.dec_op.forward([zbestacci, dec_args['dec_mubestacci'].asnumpy()], [pbestacci])
    #pbestacci = dec_model['pbestacci']
    
    # pool Z-space variables
    datalabels = np.asarray(y)
    dataZspace = np.concatenate((zbestacci, pbestacci), axis=1) 

    #####################
    # unbiased assessment: SPlit train/held-out test
    #####################
    # to compare performance need to discard unkown labels, only use known labels (only B or M)
    Z = dataZspace[datalabels!='K',:]
    y = datalabels[datalabels!='K']
  
    print '\n... MLP fully coneected layer trained on Z_train tested on Z_test' 
    sep = int(X.shape[0]*0.10)
    Z_test = Z[:sep]
    yZ_test = np.asanyarray(y[:sep]=='M').astype(int) 
    Z_train = Z[sep:]
    yZ_train = np.asanyarray(y[sep:]=='M').astype(int) 
   
    # We’ll load MLP using MXNet’s symbolic interface
    dataMLP = mx.sym.Variable('data')
    # MLP: two fully connected layers with 128 and 32 neurons each. 
    fc1  = mx.sym.FullyConnected(data=dataMLP, num_hidden = 128)
    act1 = mx.sym.Activation(data=fc1, act_type="relu")
    fc2  = mx.sym.FullyConnected(data=act1, num_hidden = 32)
    act2 = mx.sym.Activation(data=fc2, act_type="relu")
    # data has 2 classes
    fc3  = mx.sym.FullyConnected(data=act2, num_hidden=2)
    # Softmax output layer
    mlp  = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
    # create a trainable module on CPU     
    batch_size = 50
    mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
    # pass train/test data to allocate model (bind state)
    MLP_train_iter = mx.io.NDArrayIter(Z_train, yZ_train, batch_size, shuffle=False)
    mlp_model.bind(MLP_train_iter.provide_data, MLP_train_iter.provide_label)
    mlp_model.init_params()   
    mlp_model.init_optimizer()
    mlp_model_params = mlp_model.get_params()[0]
    
    # update parameters based on optimal found during cv Training
    from mxnet import ndarray
    params_dict = ndarray.load(os.path.join(save_to,'mlp_model_params_z{}_mu{}.arg'.format(znum,num_centers)))
    arg_params = {}
    aux_params = {}
    for k, value in params_dict.items():
        arg_type, name = k.split(':', 1)
        if arg_type == 'arg':
            arg_params[name] = value
        elif arg_type == 'aux':
            aux_params[name] = value
        else:
            raise ValueError("Invalid param file ")

    # order of params: [(128L, 266L),(128L,),(32L, 128L),(32L,),(2L, 32L),(2L,)]
    # organize weights and biases
    l1=[v.asnumpy().shape for k,v in mlp_model_params.iteritems()]
    k1=[k for k,v in mlp_model_params.iteritems()]
    l2=[v.asnumpy().shape for k,v in arg_params.iteritems()]
    k2=[k for k,v in arg_params.iteritems()]

    for ikparam,sizeparam in enumerate(l1):
        for jkparam,savedparam in enumerate(l2):
            if(sizeparam == savedparam):
                #print('updating layer parameters: {}'.format(savedparam))
                mlp_model_params[k1[ikparam]] = arg_params[k2[jkparam]]
    # upddate model parameters
    mlp_model.set_params(mlp_model_params, aux_params)
    
    #####################
    # ROC: Z-space MLP fully coneected layer for classification
    ####################
    # Run classifier with cross-validation and plot ROC curves
    cv = StratifiedKFold(n_splits=5,random_state=3)
    # Evaluate a score by cross-validation
    tprs_train = []; aucs_train = []
    tprs_val = []; aucs_val = []
    mean_fpr = np.linspace(0, 1, 100)
    cvi = 0
    for train, test in cv.split(Z_train, yZ_train):
        ############### on train
        MLP_train_iter = mx.io.NDArrayIter(Z_train[train], yZ_train[train], batch_size)  
        # prob[i][j] is the probability that the i-th validation contains the j-th output class.
        prob_train = mlp_model.predict(MLP_train_iter)
        # Compute ROC curve and area the curve
        fpr_train, tpr_train, thresholds_train = roc_curve(yZ_train[train], prob_train.asnumpy()[:,1])
        # to create an ROC with 100 pts
        tprs_train.append(interp(mean_fpr, fpr_train, tpr_train))
        tprs_train[-1][0] = 0.0
        roc_auc = auc(fpr_train, tpr_train)
        aucs_train.append(roc_auc)
        
        ############### on validation
        MLP_val_iter = mx.io.NDArrayIter(Z_train[test], yZ_train[test], batch_size)    
        # prob[i][j] is the probability that the i-th validation contains the j-th output class.
        prob_val = mlp_model.predict(MLP_val_iter)
        # Compute ROC curve and area the curve
        fpr_val, tpr_val, thresholds_val = roc_curve(yZ_train[test], prob_val.asnumpy()[:,1])
        # to create an ROC with 100 pts
        tprs_val.append(interp(mean_fpr, fpr_val, tpr_val))
        tprs_val[-1][0] = 0.0
        roc_auc = auc(fpr_val, tpr_val)
        aucs_val.append(roc_auc)
        # plot
        #axaroc.plot(fpr, tpr, lw=1, alpha=0.6) # with label add: label='cv %d, AUC %0.2f' % (cvi, roc_auc)
        cvi += 1
        ## appends
        if(u==3):
            pooled_pred_train = pooled_pred_train.append( pd.DataFrame({"labels":yZ_train[train],
                                  "probC":prob_train.asnumpy()[:,1],
                                  "probNC":prob_train.asnumpy()[:,0]}), ignore_index=True)
        
            pooled_pred_val = pooled_pred_val.append( pd.DataFrame({"labels":yZ_train[test],
                                  "probC":prob_val.asnumpy()[:,1],
                                  "probNC":prob_val.asnumpy()[:,0]}), ignore_index=True)
                                  
       
    # plot for cv Train
    figROCs = plt.figure(figsize=(5,5))    
    axaroc = figROCs.add_subplot(1,1,1)
    # add 50% or chance line
    axaroc.plot([0, 1], [0, 1], linestyle='--', lw=1, color='b', alpha=.9)
    # plot mean and +- 1 -std as fill area
    mean_tpr_train = np.mean(tprs_train, axis=0)
    mean_tpr_train[-1] = 1.0
    mean_auc_train = auc(mean_fpr, mean_tpr_train)
    std_auc_train = np.std(aucs_train)
    axaroc.plot(mean_fpr, mean_tpr_train, color='b',
                label=r'cv Train (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc_train, std_auc_train),lw=3, alpha=1)     
    std_tpr = np.std(tprs_train, axis=0)
    tprs_upper = np.minimum(mean_tpr_train + std_tpr, 1)
    tprs_lower = np.maximum(mean_tpr_train - std_tpr, 0)
    axaroc.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,label=r'$\pm$ 1 std. dev.') 

    # plot for cv val
    mean_tpr_val = np.mean(tprs_val, axis=0)
    mean_tpr_val[-1] = 1.0
    mean_auc_val = auc(mean_fpr, mean_tpr_val)
    std_auc_val = np.std(aucs_val)
    axaroc.plot(mean_fpr, mean_tpr_val, color='g',
                label=r'cv Val (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc_val, std_auc_val),lw=3, alpha=1)     
    std_tpr = np.std(tprs_val, axis=0)
    tprs_upper = np.minimum(mean_tpr_val + std_tpr, 1)
    tprs_lower = np.maximum(mean_tpr_val - std_tpr, 0)
    axaroc.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,label=r'$\pm$ 1 std. dev.') 
      
    ################
    # plot AUC on heldout set
    ################
    MLP_heldout_iter = mx.io.NDArrayIter(Z_test, None, batch_size)   
    probas_heldout = mlp_model.predict(MLP_heldout_iter)
      
    # plot for axaroc
    # add 50% or chance line
    axaroc.plot([0, 1], [0, 1], linestyle='--', lw=1, color='b', alpha=.9)
    # Compute ROC curve and area the curve
    fpr_test, tpr_test, thresholds_test = roc_curve(yZ_test, probas_heldout.asnumpy()[:, 1])
    auc_test = auc(fpr_test, tpr_test)
    axaroc.plot(fpr_test, tpr_test, color='r',
                label=r'Test (AUC = %0.2f)' % (auc_test),lw=3, alpha=1)     
    # set labels            
    axaroc.set_xlabel('False Positive Rate',fontsize=16)
    axaroc.set_ylabel('True Positive Rate',fontsize=16)
    axaroc.set_title('Unsupervised DEC + cv MLP classifier Zspace dim={}'.format(znum),fontsize=18)
    axaroc.legend(loc="lower right",fontsize=16)
    plt.show()
    
    if(u==3):
        pred_test = pd.DataFrame({"labels":yZ_test,
                          "probC":probas_heldout.asnumpy()[:,1],
                          "probNC":probas_heldout.asnumpy()[:,0]})
                          
znum = 34 , post training DEC results
max auc heal-out test cv, at knum centroids 0.762068965517
9
auc train cv, at knum centroids 0.802898698524
std auc train cv, at knum centroids 0.0313727355097
auc val cv, at knum centroids 0.805956625074
std of max auc val cv, at knum centroids 0.116175631374
Loading autoencoder of znum = 34, mu = 9 , post training DEC results
DEC train init AUC = 0.625757575758
DEC train max meanAuc_cv = 0.719273618538 $\pm$ 0.095940320071
DEC validation AUC at max meanAuc_cv = 0.76275862069

... MLP fully coneected layer trained on Z_train tested on Z_test
znum = 52 , post training DEC results
max auc heal-out test cv, at knum centroids 0.748965517241
3
auc train cv, at knum centroids 0.782778263403
std auc train cv, at knum centroids 0.0259015478701
auc val cv, at knum centroids 0.782761437908
std of max auc val cv, at knum centroids 0.106860090426
Loading autoencoder of znum = 52, mu = 3 , post training DEC results
DEC train init AUC = 0.607546048723
DEC train max meanAuc_cv = 0.671732026144 $\pm$ 0.120839844644
DEC validation AUC at max meanAuc_cv = 0.591724137931

... MLP fully coneected layer trained on Z_train tested on Z_test
znum = 104 , post training DEC results
max auc heal-out test cv, at knum centroids 0.753793103448
10
auc train cv, at knum centroids 0.753316822067
std auc train cv, at knum centroids 0.0286479165015
auc val cv, at knum centroids 0.758489304813
std of max auc val cv, at knum centroids 0.113960009332
Loading autoencoder of znum = 104, mu = 10 , post training DEC results
DEC train init AUC = 0.668820558526
DEC train max meanAuc_cv = 0.716154188948 $\pm$ 0.0909750155404
DEC validation AUC at max meanAuc_cv = 0.753793103448

... MLP fully coneected layer trained on Z_train tested on Z_test
znum = 261 , post training DEC results
max auc heal-out test cv, at knum centroids 0.775172413793
3
auc train cv, at knum centroids 0.778359071484
std auc train cv, at knum centroids 0.0231004683231
auc val cv, at knum centroids 0.784915329768
std of max auc val cv, at knum centroids 0.0964151659431
Loading autoencoder of znum = 261, mu = 3 , post training DEC results
DEC train init AUC = 0.673262032086
DEC train max meanAuc_cv = 0.732642602496 $\pm$ 0.0931672775493
DEC validation AUC at max meanAuc_cv = 0.775172413793

... MLP fully coneected layer trained on Z_train tested on Z_test
In [15]:
pooled_pred_val
Out[15]:
labels probC probNC
0 0 0.168150 0.831850
1 0 0.252249 0.747751
2 1 0.852703 0.147297
3 1 0.158406 0.841594
4 0 0.234078 0.765922
5 1 0.264423 0.735577
6 0 0.126331 0.873669
7 0 0.104256 0.895744
8 0 0.155487 0.844513
9 0 0.131976 0.868024
10 0 0.181444 0.818556
11 0 0.137233 0.862767
12 1 0.337790 0.662210
13 0 0.114876 0.885124
14 1 0.869022 0.130978
15 1 0.482328 0.517672
16 0 0.132510 0.867490
17 0 0.112781 0.887219
18 1 0.761882 0.238118
19 1 0.671245 0.328755
20 1 0.462532 0.537468
21 1 0.514066 0.485934
22 0 0.155968 0.844032
23 0 0.092322 0.907678
24 1 0.651779 0.348221
25 0 0.136971 0.863029
26 0 0.184948 0.815052
27 1 0.160638 0.839362
28 0 0.243174 0.756826
29 0 0.218294 0.781707
... ... ... ...
268 0 0.102402 0.897598
269 0 0.142651 0.857349
270 0 0.324535 0.675465
271 0 0.134824 0.865176
272 0 0.146287 0.853713
273 0 0.733112 0.266888
274 0 0.137017 0.862983
275 0 0.402253 0.597747
276 0 0.180981 0.819019
277 1 0.255621 0.744379
278 1 0.740978 0.259022
279 1 0.151876 0.848124
280 1 0.105827 0.894173
281 1 0.092056 0.907944
282 1 0.515520 0.484480
283 1 0.594362 0.405638
284 1 0.297177 0.702823
285 1 0.149569 0.850431
286 1 0.188766 0.811234
287 1 0.164938 0.835062
288 0 0.122843 0.877157
289 0 0.109546 0.890454
290 1 0.898482 0.101518
291 0 0.551291 0.448709
292 0 0.140002 0.859998
293 1 0.255348 0.744652
294 1 0.859552 0.140448
295 1 0.146430 0.853570
296 0 0.197563 0.802438
297 1 0.164457 0.835543

298 rows × 3 columns

In [16]:
################
# save pooled model probabilties
pooled_pred_train.to_csv('datasets/exp3_pooled_pred_train.csv', header=True, index=False)
pooled_pred_val.to_csv('datasets/exp3_pooled_pred_val.csv', header=True, index=False)
pred_test.to_csv('datasets/exp3_pred_test.csv', header=True, index=False)